{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "\n",
    "Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle).\n",
    "\n",
    "To let users get started quickly, the main purpose of this tutorial is:\n",
    "* Understand how a graph network is calculated based on PGL.\n",
    "\n",
    "* Use PGL to implement a simple graph neural network model, which is used to classify the nodes in the graph.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: using PGL to create a graph \n",
    "Suppose we have a graph with 10 nodes and 14 edges as shown in the following figure:\n",
    "![A simple graph](./quick_start_graph.png)\n",
    "\n",
    "Our purpose is to train a graph neural network to classify yellow and green nodes. So we can create this graph in such way:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "import paddle\n",
    "import paddle.nn as nn\n",
    "import paddle.nn.functional as F\n",
    "from paddle.optimizer import Adam\n",
    "import pgl\n",
    "\n",
    "\n",
    "def build_graph():\n",
    "    # define the number of nodes; we can use number to represent every node\n",
    "    num_node = 10\n",
    "    # add edges, we represent all edges as a list of tuple (src, dst)\n",
    "    edge_list = [(2, 0), (2, 1), (3, 1),(4, 0), (5, 0), \n",
    "             (6, 0), (6, 4), (6, 5), (7, 0), (7, 1),\n",
    "             (7, 2), (7, 3), (8, 0), (9, 7)]\n",
    "\n",
    "    # Each node can be represented by a d-dimensional feature vector, here for simple, the feature vectors are randomly generated.\n",
    "    d = 16\n",
    "    feature = np.random.randn(num_node, d).astype(\"float32\")\n",
    "    # each edge has it own weight\n",
    "    edge_feature = np.random.randn(len(edge_list), 1).astype(\"float32\")\n",
    "    \n",
    "    # create a graph\n",
    "    g = pgl.Graph(edges = edge_list,\n",
    "                  num_nodes = num_node,\n",
    "                  node_feat = {'nfeat':feature}, \n",
    "                  edge_feat ={'efeat': edge_feature})\n",
    "\n",
    "    return g\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "g = build_graph()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After creating a graph in PGL, we can print out some information in the graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('There are %d nodes in the graph.'%g.num_nodes)\n",
    "print('There are %d edges in the graph.'%g.num_edges)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: create a simple Graph Convolutional Network(GCN)\n",
    "\n",
    "In this tutorial, we use a simple Graph Convolutional Network(GCN) developed by [Kipf and Welling](https://arxiv.org/abs/1609.02907) to perform node classification. Here we use the simplest GCN structure. If you want to know more about GCN, you can refer to the original paper.\n",
    "\n",
    "* In layer $l$，each node $u_i^l$ has a feature vector $h_i^l$;\n",
    "* In every layer,  the idea of GCN is that the feature vector $h_i^{l+1}$ of each node $u_i^{l+1}$ in the next layer are obtained by weighting the feature vectors of all the neighboring nodes and then go through a non-linear transformation.  \n",
    "\n",
    "In PGL, we can easily implement a GCN layer as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class GCN(nn.Layer):\n",
    "    \"\"\"Implement of GCN\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(self,\n",
    "                 input_size,\n",
    "                 num_class,\n",
    "                 num_layers=2,\n",
    "                 hidden_size=16,\n",
    "                 **kwargs):\n",
    "        super(GCN, self).__init__()\n",
    "        self.num_class = num_class\n",
    "        self.num_layers = num_layers\n",
    "        self.hidden_size = hidden_size\n",
    "        self.gcns = nn.LayerList()\n",
    "        for i in range(self.num_layers):\n",
    "            if i == 0:\n",
    "                self.gcns.append(\n",
    "                    pgl.nn.GCNConv(\n",
    "                        input_size,\n",
    "                        self.hidden_size,\n",
    "                        activation=\"relu\",\n",
    "                        norm=True))\n",
    "            else:\n",
    "                self.gcns.append(\n",
    "                    pgl.nn.GCNConv(\n",
    "                        self.hidden_size,\n",
    "                        self.hidden_size,\n",
    "                        activation=\"relu\",\n",
    "                        norm=True))\n",
    "                \n",
    "        self.output = nn.Linear(self.hidden_size, self.num_class)\n",
    "    def forward(self, graph, feature):\n",
    "        for m in self.gcns:\n",
    "            feature = m(graph, feature)\n",
    "        logits = self.output(feature)\n",
    "        return logits"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3:  data preprocessing\n",
    "\n",
    "Since we implement a node binary classifier, we can use 0 and 1 to represent two classes respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = [0,1,1,1,0,0,0,1,0,1]\n",
    "label = np.array(y, dtype=\"float32\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4:  training\n",
    "\n",
    "The training process of GCN is the same as that of other paddle-based models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "g = g.tensor()\n",
    "y = paddle.to_tensor(y)\n",
    "gcn = GCN(16, 2)\n",
    "criterion = paddle.nn.loss.CrossEntropyLoss()\n",
    "optim = Adam(learning_rate=0.01, \n",
    "             parameters=gcn.parameters())\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gcn.train()\n",
    "for epoch in range(30):\n",
    "    logits = gcn(g, g.node_feat['nfeat'])\n",
    "    loss = criterion(logits, y)\n",
    "    loss.backward()\n",
    "    optim.step()\n",
    "    optim.clear_grad()\n",
    "    print(\"epoch: %s | loss: %.4f\" % (epoch, loss.numpy()[0]))\n",
    "    "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
